International Journal of Educational Methodology

IJEM is a leading, peer-reviewed, open access, research journal that provides an online forum for studies in education, by and for scholars and practitioners, worldwide.


Publisher (HQ)

RHAPSODE LTD
Eurasian Society of Educational Research
College House, 2nd Floor 17 King Edwards Road, Ruislip, London, UK. HA4 7AE

'item measurement' Search Results



...

Assessment for Learning (AfL) may be conceptualized as minute-to-minute, day-by-day interactions between learners and teachers with the improvement of learning as the principal focus. This paper traces the development of an AfL measurement instrument (scale) that can be used for research purposes prior to, during and following professional development in the area. Rasch measurement procedures were applied to data drawn from a convenience sample of 594 teachers from 44 elementary schools in Ireland to create a scale consisting of 20 items distributed across four key AfL assessment strategies: learning intentions and success criteria, questioning and classroom discussion, feedback, and peer- and self-assessment. This scale, the Assessment for Learning Measurement instrument (AfLMi), has good psychometric properties and is interpretable in a way that makes it potentially useful during system-wide improvement initiatives focused on AfL.

DOI: 10.12973/ijem.3.2.103
Pages: 103-115
Article Metrics
Views: 1414
Download: 1683
Citations
Crossref: 5

Scopus

...

This research was conducted to investigate the predictive role of homophobia and unconditional self-acceptance on respect of differences in psychological counselor candidates. Participants were 239 psychological counselor candidates. The Respect of Differences Scale, the Homophobia Scale, and the Unconditional Self-Acceptance Scale were used to collect the data. Path analysis was used to determine the influences of variables on respect of differences. The independent sample t-test and one-way ANOVA were used to determine differences between participants in terms of gender and grade. The results of the analysis indicated that homophobia and unconditional self-acceptance are predictors of respect of differences, and place of living and traditionally have an indirect effect on respect of differences. In addition, female participants reported a higher level of respect of differences than male participants. Similarly, first year college students reported a higher level of respect of differences than fourth year college students.

DOI: 10.12973/ijem.5.1.59
Pages: 59-70
Article Metrics
Views: 493
Download: 875
Citations
Crossref: 2

Scopus

...

The aim of this study is to compare the 2018 Science Course Curriculum (SCC), the 2015 Trends in International Mathematics and Science Study (TIMSS) and the 2018 High School Entrance Examination (HSE) in terms of content domains, cognitive domains and learning objectives. A qualitative research method was used in this study. Data were analyzed using document review matrices to determine the similarities and differences between the objectives of SCC, TIMSS and HSE. SCC outcomes and HSE science questions were also classified according to TIMSS cognitive domains. Results show that the TIMSS learning objectives in Physics, Biology and Earth Sciences are compatible with those of all grade levels of SCC and that the objectives of Chemistry are compatible with those of the seventh and eighth grades. Most HSE questions are compatible with the objectives of SCC; however, the latest revision of the curriculum has moved some eighth-grade objectives to other grade levels. HSE science questions measure higher-level skills than TIMSS science questions. The “Organisms and Life” subject domain of SCC has the most learning objectives at the “knowing” and “reasoning” levels, while the “Physical Events” subject domain has the most learning objectives at the “applying” level. In addition, the seventh, fifth and eighth grades have the most objectives at the “knowing,” “applying,” and “reasoning” levels, respectively. It is hoped that the results will contribute to the literature on improving science curricula and interpreting national and international exams.

DOI: 10.12973/ijem.5.3.433
Pages: 433-449
Article Metrics
Views: 846
Download: 1099
Citations
Crossref: 2

Scopus

...

Teacher-made tests (TMTs) are the most used instruments for assessment and evaluation. This study investigates the cognitive requirements, test construction errors, and item types of TMTs. Content analysis is used to analyze and classify TMT items based on the TIMSS-2019 assessment framework and on criteria constructed to identify test construction errors. The data consist of 548 items in 30 exam papers from 18 mathematics teachers at 13 distinct schools. The distribution of TIMSS-2019 cognitive demands across all TMTs indicates a strong emphasis on the knowing and applying cognitive domains, which together account for 93% of items. Teachers mostly prefer the multiple-choice item type: 83% of all questions are multiple choice and 17% are constructed response. Findings also reveal that, apart from face validity, the tests contain construction errors. Consequently, it is suggested that teachers take more care in preparing items at higher cognitive levels, tests with mixed item types, and tests with fewer construction errors, so that the tests are more reliable. Finally, it is also suggested that measurement and evaluation specialists be employed in each school, or at least in each local Ministry of National Education authority, to support teachers; if this is not possible in the near term, in-service training programs on measurement and evaluation should be offered to teachers.

DOI: 10.12973/ijem.5.3.479
Pages: 479-488
Article Metrics
Views: 344
Download: 708
Citations
Crossref: 3

Scopus

...

This study describes the development and validation of a psychometrically-sound instrument, the Active Learning Strategies Inventory (ALSI), designed to measure learners’ perceptions of their active learning strategies within an active learning context. Active learning encompasses a broad range of pedagogical practices and instructional methods that connect with an individual learner's active learning strategies. In order to fulfill the study's goals, a conceptual framework on learners’ active learning strategies was developed and proposed, drawing upon the research literature on active learning. The development and construct validation of the Active Learning Strategies Inventory (ALSI), based on the conceptual and methodological underpinnings, involved identifying five scales of learners’ active learning strategies: engagement, cognitive processing, orientation to learning, readiness to learn and motivational orientation. An item pool of 20 items was generated following an extensive review of the literature, standardized card sorting procedures including confirmatory factor analysis and scale validation of a pilot (n = 407) survey. The ALSI scale demonstrated strong internal consistency and reliability with a Cronbach's alpha ranging from 0.81 to 0.87. High item loading scores from the factor analysis provided initial support for the instrument's construct validity of the five-factor model. The ALSI scale provides a reliable and valid method for researchers and academicians who wish to measure learners' perceptions of their active learning strategies within an active learning context. Finally, we discuss the implications and address the limitations and directions for future research.
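
For readers unfamiliar with the reliability figure quoted in this abstract, the following is a minimal sketch of how Cronbach's alpha is commonly computed from a respondents-by-items score table. The function name and the toy data are illustrative assumptions and are not taken from the ALSI study.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of respondents, each a list of k item scores.

    Hypothetical helper: uses sample variances of each item and of the
    respondents' total scores.
    """
    k = len(scores[0])
    item_vars = [variance([row[j] for row in scores]) for j in range(k)]
    total_var = variance([sum(row) for row in scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Invented toy data: 5 respondents answering 4 Likert-type items.
toy = [
    [4, 5, 4, 5],
    [3, 3, 4, 3],
    [2, 2, 1, 2],
    [5, 4, 5, 5],
    [1, 2, 2, 1],
]
alpha = cronbach_alpha(toy)
print(round(alpha, 3))
```

Alpha rises toward 1 as the items vary together; in this sketch it is computed for one scale at a time, which is how a range of values across several scales would arise.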

DOI: 10.12973/ijem.7.1.201
Pages: 201-223
Article Metrics
Views: 999
Download: 1171
Citations
Crossref: 5

Scopus: 4

...

The purpose of this study is to examine the mediating role of cognitive flexibility and difficulties in emotion regulation in the relationship between resilience and distress tolerance among college students. The sample consisted of 1114 students (771 females, 343 males) from various universities in Turkey. The mean age of the sample was 20.65 (SD = 2.77). The Resilience Scale, Distress Tolerance Scale, Cognitive Flexibility Scale, and Difficulties in Emotion Regulation Scale (DERS) were used to collect data. A Serial Multiple Mediation Model, as proposed by Hayes, was used. The findings showed that people with a higher level of distress tolerance possess higher degrees of cognitive flexibility, that cognitively more flexible individuals experience less difficulty in emotion regulation, and that lower levels of difficulty in emotion regulation were in turn associated with an increase in resilience. Furthermore, the model in its entirety proved to be statistically significant, accounting for 42% of the total variance.

DOI: 10.12973/ijem.5.4.525
Pages: 525-533
Article Metrics
Views: 3582
Download: 4104
Citations
Crossref: 41

Scopus: 33

...

The Pearson product–moment correlation coefficient between item g and test score X, known as the item–test or item–total correlation (Rit), and the item–rest correlation (Rir) are two of the most used classical estimators of item discrimination power (IDP). Both Rit and Rir underestimate IDP because of the mismatch between the scales of the item and the score. The underestimation may be drastic when the difficulty level of the item is extreme. Based on a simulation, in a binary dataset a good alternative to Rit and Rir could be Somers’ D: it reaches the ultimate values +1 and –1, it underestimates IDP remarkably less than Rit and Rir, and, being a robust statistic, it is more stable against changes in the data structure. Somers’ D has, however, one major disadvantage in the polytomous case: it tends to underestimate the magnitude of the association between item and score more than Rit does when the item scale has four categories or more.
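
The contrast this abstract draws can be illustrated with a small sketch. It computes Rit, Rir and a Somers' D on an invented five-person dataset with one very hard binary item. The direction of Somers' D used here (normalising by the pairs that differ on the item) is an assumption made for illustration, and the helper names and data are hypothetical rather than taken from the article.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Plain Pearson product-moment correlation."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def somers_d(item, score):
    """Somers' D, normalised by pairs that differ on the item (assumed direction)."""
    concordant = discordant = untied_on_item = 0
    for (g1, x1), (g2, x2) in combinations(zip(item, score), 2):
        if g1 == g2:
            continue  # pairs tied on the item do not enter the denominator
        untied_on_item += 1
        sign = (g1 - g2) * (x1 - x2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / untied_on_item


# Invented data: only the highest scorer answers the item correctly.
item = [0, 0, 0, 0, 1]
score = [1, 2, 3, 4, 5]           # total score, item included
rest = [x - g for g, x in zip(item, score)]

rit = pearson(item, score)        # item-total correlation
rir = pearson(item, rest)         # item-rest correlation
d = somers_d(item, score)
print(rit, rir, d)
```

In this toy case the item orders the test takers perfectly, so D reaches 1, while both correlations stay well below 1 because the binary item and the multi-point score are on different scales.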

DOI: 10.12973/ijem.6.1.207
Pages: 207-221
Article Metrics
Views: 1093
Download: 1327
Citations
Crossref: 16

Scopus

...

Kelley’s Discrimination Index (DI) is a simple and robust classical non-parametric short-cut for estimating item discrimination power (IDP) in practical educational settings. Unlike item–total correlation, DI can reach the ultimate values of +1 and ‒1, and it is stable against outliers. Because of its computational ease, DI is especially suitable for rough estimation where sophisticated tools for item analysis, such as IRT modelling, are not available, as is usual, for example, in classroom testing. Unlike most other traditional indices of IDP, DI uses only the extreme cases of the ordered dataset in the estimation. One deficiency of DI is that it is suitable only for dichotomous datasets. This article generalizes DI to allow polytomous datasets and flexible cut-offs for selecting the extreme cases. A new algorithm based on the concept of the characteristic vector of the item is introduced to compute the generalized DI (GDI). A new visual method for item analysis, the cut-off curve, is introduced based on a procedure called exhaustive splitting.
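
As a rough illustration of the classical index this abstract starts from, here is a sketch of the dichotomous DI: the proportion answering correctly in the top-scoring group minus the proportion in the bottom-scoring group, with the conventional 27% cut-off as the default. It does not implement the article's generalized GDI or its characteristic-vector algorithm; the function name, the tie handling and the data are assumptions.

```python
def kelley_di(item, score, fraction=0.27):
    """Classical discrimination index for a 0/1 item.

    Hypothetical helper: compares the upper and lower `fraction` of test
    takers ordered by score. Ties in score are broken by input order.
    """
    n = len(item)
    k = max(1, round(n * fraction))            # size of each extreme group
    order = sorted(range(n), key=lambda i: score[i])
    lower, upper = order[:k], order[-k:]
    p_upper = sum(item[i] for i in upper) / k  # proportion correct, top group
    p_lower = sum(item[i] for i in lower) / k  # proportion correct, bottom group
    return p_upper - p_lower


# Invented data: 10 test takers, scores 1..10, one binary item.
item = [0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
score = list(range(1, 11))

di = kelley_di(item, score)
di_reversed = kelley_di([1 - g for g in item], score)
print(di, di_reversed)
```

Only the extreme groups enter the calculation, so the mixed responses in the middle of the score range do not affect the result in this example.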

DOI: 10.12973/ijem.6.2.237
Pages: 237-258
Article Metrics
Views: 831
Download: 1008
Citations
Crossref: 6

Scopus

...

A new index of item discrimination power (IDP), dimension-corrected Somers’ D (D2), is proposed. Somers’ D is one of the superior alternatives to item–total (Rit) and item–rest (Rir) correlation in reflecting the real IDP for items with scales 0/1 and 0/1/2, that is, up to three categories. D also correctly reaches the extreme values +1 and ‒1, while Rit and Rir cannot reach the ultimate values in real-life testing settings. However, when the item has four categories or more, Somers’ D underestimates IDP more than Pearson correlation does. A simple correction to Somers’ D in the polytomous case seems to be effective in item analysis settings. In the simulation with real-life items, D2 showed very few cases of obvious underestimation and practically no cases of obvious overestimation. With certain restrictions discussed in the article, D2 seems to be a good alternative to these classic estimators, not only with dichotomous items but also with polytomous ones. In general, the magnitudes of the estimates by D2 are higher than those by Rit, Rir, and polychoric correlation, and they seem to be close to those of bi- and polyserial correlation coefficients without out-of-range values.

DOI: 10.12973/ijem.6.2.297
Pages: 297-317
Article Metrics
Views: 364
Download: 815
Citations
Crossref: 8

Scopus

...

Progress monitoring of academic achievement is an essential element in preventing learning disorders. A prominent approach is curriculum-based measurement (CBM). Various studies have documented positive effects of CBM on students’ achievement. Nevertheless, the use of CBM is associated with additional work for teachers. The use of tablets may be of help here. Yet, although many advantages of computer- or tablet-based assessments are discussed in the literature (e.g., innovative item formats, adaptive testing, automated scoring and feedback), there are still concerns regarding the comparability of different assessment modes (paper-pencil vs. tablet). In the study presented, we analyze the CBM data of 98 fourth graders. They processed the exact same computation items once with paper and pen and once in a tablet application. The analyses point to comparable results in the test modes, although some significant deviations can be found at item level. In addition, the children report perceived benefits when working with the tablet.

DOI: 10.12973/ijem.6.4.669
Pages: 669-680
Article Metrics
Views: 1030
Download: 1147
Citations
Crossref: 13

Scopus

...

Although Goodman–Kruskal gamma (G) is used relatively rarely, it has promising potential as a coefficient of association in educational settings. Characteristics of G are studied in three sub-studies related to educational measurement settings. G appears to be unexpectedly appealing as an estimator of association between an item and a score because it strictly indicates the probability of getting a correct answer on the test item given the score, and it accurately produces perfect latent association irrespective of distributions, degrees of freedom, number of tied pairs and tied values in the variables, or the difficulty levels of the items. However, it underestimates the association in an obvious manner when the number of categories in the item is more than four. To address this, a dimension-corrected G (G2) is proposed and its characteristics are studied. Both G and G2 appear to be promising alternatives in measurement modelling settings, G with binary items and G2 with binary, polytomous and mixed datasets.
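
For orientation, Goodman–Kruskal gamma is conventionally defined as (C − D) / (C + D), where C and D count the concordant and discordant pairs and tied pairs are ignored. The sketch below applies that standard definition to invented item and score vectors; it does not implement the article's dimension-corrected G2, and the function name is hypothetical.

```python
from itertools import combinations


def goodman_kruskal_gamma(item, score):
    """Gamma = (C - D) / (C + D); pairs tied on either variable are skipped."""
    concordant = discordant = 0
    for (g1, x1), (g2, x2) in combinations(zip(item, score), 2):
        sign = (g1 - g2) * (x1 - x2)
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (concordant + discordant)


# Invented examples: a perfectly ordered item (with a tie in the score)
# and an item with one out-of-order response.
g_perfect = goodman_kruskal_gamma([0, 0, 1, 1], [1, 2, 2, 3])
g_mixed = goodman_kruskal_gamma([0, 1, 0, 1], [1, 2, 3, 4])
print(g_perfect, g_mixed)
```

Because ties are dropped from both numerator and denominator, the first example still reaches 1 even though two test takers share a score.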

DOI: 10.12973/ijem.7.1.95
Pages: 95-118
Article Metrics
Views: 870
Download: 813
Citations
Crossref: 9

Scopus

...

This study aims to produce empirical evidence of the validity and reliability of instrument items for the competency framework of agricultural teaching staff in Malaysian agricultural vocational colleges. The validity and reliability of the framework were analyzed using Rasch Model Measurement assisted by Winsteps 3.72 software. The research instrument contained 116 items and was distributed to 30 instructors at the Teluk Intan Agricultural Vocational College, Malaysia. Respondents were selected by stratified random sampling: the researcher divided the population into strata according to their percentages and then selected randomly based on the desired percentage. Validity analysis of the instrument was done through four functional tests. For reliability and separation of respondents, it was found that the individual reliability value was very good and acceptable. The item polarity analysis detected no negative values in the Point Measure Correlation. Item matching analysis found that 11 items had to be dropped as they failed to meet the required conditions. The analysis of local dependence, which identifies dependent items from the standardized residual correlation values, flagged 13 items as needing attention. The results of the data analysis checking the functionality of the items suggested that some items should be dropped. After these items were omitted, the results provided evidence that the competence instrument for agricultural instructors has a high level of validity and reliability for use in actual studies.

DOI: 10.12973/ijem.7.3.411
Pages: 411-420
Article Metrics
Views: 318
Download: 557
Citations
Crossref: 2

Scopus: 2

...

This study reviews 60 papers that used a Likert scale and were published between 2012 and 2021. Literature screening followed the PRISMA method. Data were extracted and then synthesized in a structured manner using the narrative method. To achieve credible research results during data collection and analysis, a focus group discussion (FGD) was conducted. The findings show that only 10% of studies use a measurement scale with an even number of response categories (4, 6, 8, or 10 choices). Most studies (90%) use a measurement instrument with a Likert scale that has an odd number of response choices (5, 7, 9, or 11), and the most popular choice is a 5-point Likert scale. A rating scale with an odd number of responses greater than five (especially a seven-point scale) is the most effective in terms of reliability and validity coefficients, but if the researcher wants to direct respondents to one side, then a scale with an even number of responses (six points) may be more suitable. The presence of response bias and central tendency bias can affect the validity and reliability of a Likert scale instrument.

DOI: 10.12973/ijem.8.4.625
Pages: 625-637
Article Metrics
Views: 1354
Download: 2259
Citations
Crossref: 9

Scopus: 4

Rethinking the Components of Regulation of Cognition through the Structural Validity of the Meta-Text Test

metacognition, performance-based testing, regulation of cognition, structural validity

Marcio Alexander Castillo-Diaz, Cristiano Mauro Assis Gomes, Enio Galinkin Jelihovschi


...

The field of studies in metacognition points to some limitations in the way the construct has traditionally been measured and shows a near absence of performance-based tests. The Meta-Text is a performance-based test recently created to assess components of cognition regulation: planning, monitoring, and judgment. This study presents the first evidence on the structural validity of the Meta-Text, by analyzing its dimensionality and reliability in a sample of 655 Honduran university students. Different models were tested, via item confirmatory factor analysis. The results indicated that the specific factors of planning and monitoring do not hold empirically. The bifactor model containing the general cognition regulation factor and the judgment-specific factor was evaluated as the best model (CFI = .992; NFI = .963; TLI = .991; RMSEA = .021). The reliability of the factors in this model proved to be acceptable (Ω = .701 & .699). The judgment items were well loaded only by the judgment factor, suggesting that the judgment construct may actually be another component of the metacognitive knowledge dimension but having little role in cognition regulation. The results show initial evidence on the structural validity of the Meta-Text and give rise to information previously unidentified by the field which has conceptual implications for theorizing metacognitive components.

DOI: 10.12973/ijem.8.4.687
Pages: 687-698
Article Metrics
Views: 295
Download: 596
Citations
Crossref: 2

Scopus: 1

Graded Response Models on the Curiosity Measurement of Elementary School Students

curiosity measurement, elementary school, graded response models

Herwin Herwin, Riana Nurhayati, Aprilia Tina Lidyasari, Augusto da Costa


...

Curiosity is one of the most important character traits for elementary school students. However, practice in the field shows that no standardized measurement model is yet available for teachers to identify students' curiosity. This study aims to develop a model for measuring the curiosity of elementary school students using the graded response model (GRM) approach. The research uses a descriptive quantitative method. The sample consisted of 236 randomly selected elementary school students. Data were collected using a questionnaire of 16 statement items on a Likert scale. The data were analyzed using item response theory with the GRM. The results showed that the model for measuring student curiosity in elementary schools had good location parameters, a good discrimination index, and a fairly good information function with a small estimation error. The curiosity measurement model in this study can be used as an alternative for teachers to identify students' curiosity in elementary schools.
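
To make the GRM terminology above concrete, the sketch below computes category probabilities for one Likert-type item under the graded response model in its usual logistic form: each threshold defines the probability of responding in that category or higher, and adjacent differences give the category probabilities. The parameter values are invented for illustration and are not estimates from this study; the function name is hypothetical.

```python
from math import exp


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta: person location; a: item discrimination; thresholds: ascending
    location parameters b_1 < ... < b_m. Returns m + 1 probabilities.
    """
    def logistic(z):
        return 1.0 / (1.0 + exp(-z))

    # Cumulative probabilities of responding in category k or higher,
    # padded with 1 (lowest category or higher) and 0 (above the highest).
    cumulative = [1.0] + [logistic(a * (theta - b)) for b in thresholds] + [0.0]
    return [cumulative[k] - cumulative[k + 1] for k in range(len(cumulative) - 1)]


# Invented parameters for a four-category item.
probs = grm_category_probs(theta=0.5, a=1.2, thresholds=[-1.0, 0.0, 1.5])
print([round(p, 3) for p in probs])
```

The location parameters determine where along the trait each response category becomes more likely, and the discrimination parameter controls how sharply the probabilities change around those points.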

DOI: 10.12973/ijem.9.1.53
Pages: 53-62
Article Metrics
Views: 302
Download: 543
Citations
Crossref: 0

Scopus: 0

...

The purpose of this systematic literature review (SLR) is to identify: (a) the topic of the study, (b) the research methods used, and (c) the results of research on Mathematics education in Malaysia. This study discusses the use of teaching aid (TA) in the field of syllabus and geometry for Form 2 students. The use of TA is considered highly successful and relevant for educators to improve the quality of the teacher’s instructions and students’ understanding. Therefore, following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines by Moher et al. (2015), a systematic review was carried out to determine the appropriate strategies and variables for the field. Four stages constitute the PRISMA paradigm used in this study: identification, screening, eligibility, and inclusion. Using criteria chosen by the researchers across multiple search sources, including Google Scholar, Researchgate, Scopus, and Emerald, over 20 papers were identified for additional investigation. The data were then analysed quantitatively to describe the research's findings. From the results, two main research themes were found, namely (a) learning to use TA; and (b) the field of measurement and geometry of Mathematics. The results of the article analysis indicate that Mathematics education in Malaysia is currently at a moderate level and is ineffective at fostering students' understanding and interest. These results are anticipated to serve as the foundation for teachers, students, schools, and the Ministry of Education to undertake more engaging and interactive learning, particularly in the subject areas of mathematics and geometry.

DOI: 10.12973/ijem.9.2.387
Pages: 387-396
Article Metrics
Views: 197
Download: 504
Citations
Crossref: 0

Scopus: 0

Validation of the Adolescent Social Identity Measure: Adolescents’ Perception of Themselves in a Social Context

adolescents, confirmatory factor analysis, social identity, validation

Annemaree Carroll, Julie M. Bower, Jenny Povey, Sandy Muspratt, Holly Chen


...

Social identity is an important social determinant of student outcomes such as mental health and well-being. Currently, no validated social identity measures exist for adolescents in secondary school settings. A new ‘Adolescent Social Identity’ measure was developed by adapting two social identity dimensions from a validated reputation enhancement scale. The Social Identity Measure comprises two scales of 10 items each to measure how adolescents think their peers view them (e.g., reputational status) in terms of their conforming and nonconforming behaviour (Self-perception of Public Self) and how adolescents would ideally like to be viewed (Ideal Public Self) by peers. Exploratory and confirmatory factor analyses were conducted along with assessments of reliability, validity, and measurement invariance. Conforming and Nonconforming subscales for both scales were shown to be reliable, valid, and invariant across age and gender groupings. There were significant but small differences in the latent means for gender.

DOI: 10.12973/ijem.9.3.551
Pages: 551-565
Article Metrics
Views: 210
Download: 371
Citations
Crossref: 0

Scopus: 0

...

The role of artificial intelligence (AI) in education remains incompletely understood, demanding further evaluation and the creation of robust assessment tools. Despite previous attempts to measure AI's impact in education, existing studies have limitations. This research aimed to develop and validate an assessment instrument for gauging AI effects in higher education. The study employed various analytical methods, including Exploratory Factor Analysis, Confirmatory Factor Analysis, and Rasch Analysis; the initial 70-item instrument covered seven constructs. The instrument was administered to 635 students at Nueva Ecija University of Science and Technology – Gabaldon campus, and content validity was assessed using the Lawshe method. After 19 items were eliminated through EFA and CFA, Rasch analysis confirmed the construct validity and led to the removal of three more items. The final 48-item instrument, categorized into learning experiences, academic performance, career guidance, motivation, self-reliance, social interactions, and AI dependency, emerged as a valid and reliable tool for assessing AI's impact on higher education, especially among college students.
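
The abstract names the Lawshe method without stating the formula. As commonly presented, Lawshe's content validity ratio for an item is CVR = (n_e − N/2) / (N/2), where n_e is the number of panelists rating the item essential and N is the panel size. The sketch below applies that formula to invented panel counts and also checks the item-count arithmetic reported in the abstract; the function name and panel figures are assumptions.

```python
def content_validity_ratio(n_essential, n_panelists):
    """Lawshe's CVR: ranges from -1 (no one says essential) to +1 (everyone does)."""
    half = n_panelists / 2
    return (n_essential - half) / half


# Invented panel of 10 experts rating a single item.
cvr_strong = content_validity_ratio(9, 10)   # 9 of 10 rate the item essential
cvr_split = content_validity_ratio(5, 10)    # panel evenly split

# Item counts reported in the abstract: 70 initial items, 19 removed
# through EFA and CFA, 3 more removed after Rasch analysis.
remaining_items = 70 - 19 - 3
print(cvr_strong, cvr_split, remaining_items)
```

The subtraction reproduces the 48 items the abstract gives for the final instrument.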

DOI: 10.12973/ijem.10.2.997
Pages: 197-211
Article Metrics
Views: 37
Download: 128
Citations
Crossref: 0

Scopus: 0

...